Abstract

The Stackelberg equilibrium is a solution concept that describes optimal strategies to commit to: 
Player 1 (the leader) first commits to a strategy that is publicly announced, then Player 2 (the follower) 
plays a best response to the leader’s choice. We study the problem of computing Stackelberg equilibria in 
finite sequential (i.e., extensive-form) games and provide new exact algorithms, approximate algorithms, 
and hardness results for finding equilibria for several classes of such two-player games. 


1 Introduction 

The Stackelberg competition is a game-theoretic model introduced by von Stackelberg [25] for studying
market structures. The original formulation of a Stackelberg duopoly captures the scenario of two firms
that compete by selling homogeneous products. One firm (the leader) first decides the quantity to sell and
announces it publicly, while the second firm (the follower) decides its own production only after observing
the announcement of the first firm. The leader firm must have commitment power (e.g., it is the monopoly in an
industry) and cannot undo its publicly announced strategy, while the follower firm (e.g., a new competitor)
plays a best response to the leader’s chosen strategy.

The Stackelberg competition has been an important model in economics ever since (see, e.g., [22]),
while the solution concept of a Stackelberg equilibrium has been studied in a rich body
of literature in computer science, with a number of important real-world applications developed in the last 
decade [23]. The Stackelberg equilibrium concept can be applied to any game with two players (e.g., in
normal or extensive form) and stipulates that the leader first commits to a strategy, while the follower
observes the leader’s choice and best responds to it. The leader must have commitment power; in the context
of firms, the act of moving first in an industry, such as by opening a shop, requires financial investment and
is evidently a form of commitment. In other scenarios, the leader’s commitment refers to ways of responding
to future events, should certain situations be reached, and in such cases the leader must have a way of
enforcing credible threats. The leader can always commit to a Nash equilibrium strategy; however, it can
often obtain a better payoff by choosing some other strategy profile. While there exist generalizations of
Stackelberg equilibrium to multiple players, the real-world implementations to date have been derived from
understanding the two-player model, and for this reason we will focus on the two-player setting.

One of the notable applications using the conceptual framework of Stackelberg equilibrium has been the 
development of algorithms for protecting airports and ports in the United States (deployed so far in Boston, 
Los Angeles, New York). More recent ongoing work (see, e.g., [20]) explores additional problems such as
protecting wildlife, forests, and fisheries. The general task of defending valuable resources against attacks can
be cast in the Stackelberg equilibrium model as follows. The role of the leader is taken by the defender (e.g., 
police forces), who commits to a strategy, such as the allocation of staff members to a patrolling schedule 
of locations to check. The role of the follower is played by a potential attacker, who monitors the empirical 
distribution (or even the entire schedule) of the strategy chosen by the defender, and then best responds, 
by devising an optimal attack given this knowledge. The crucial question is how to minimize the damage 
from potential threats, by computing an optimal schedule for the defender. Solving this problem in practice 
involves several nontrivial steps, such as estimating the payoffs of the participants for the resources involved 
(e.g., the attacker’s reward for destroying a section of an airport) and computing the optimal strategy that
the defender should commit to. 

In this paper, we are interested in the following fundamental question: 

Given the description of a game in extensive form, compute the optimal strategy that the leader 
should commit to. 

We study this problem for multiple classes of two-player extensive-form games (EFGs) and variants of 
the Stackelberg solution concept that differ in the kinds of strategies to commit to, and provide both efficient
algorithms and computational hardness results. We emphasize the positive results in the main text of the 
submission and fully state technical hardness results in the appendix. 

1.1 Our Results 

The problem of computing a Stackelberg equilibrium in EFGs can be classified by the following parameters: 

• Information. Information captures how much a player knows about the opponent’s moves (past and 
present). We study turn-based games (TB), where for each state there is a unique player that can 
perform an action, and concurrent-move games (CM), where the players act simultaneously in at least 
one state. 

• Chance. A game with chance nodes allows stochastic transitions between states; otherwise, the
transitions are deterministic (made through actions of the players).

• Graph. We focus on trees and directed acyclic graphs (DAGs) as the main representations. Given such 
a graph, each node represents a different state in the game, while the edges represent the transitions 
between states. 

• Strategies. We study several major types of strategies that the leader can commit to, namely pure (P), 
behavioral (B), and correlated behavioral (C). 
Table 1: Overview of the computational complexity results containing both existing and new results provided
by this paper (marked with *). Information column: TB stands for turn-based and CM for concurrent moves.
Strategies: P stands for pure, B for behavioral, and C for correlated. Finally, |S| denotes the number of
decision points in the game and |Z| the number of terminal states.

Row  Information  Chance  Graph  Strategies  Complexity              Source
1*   TB           no      DAG    P           O(|S| · (|S| + |Z|))    Theorem 1
2    TB           no      Tree   B           O(|S| · |Z|^2)          [17]
3*   TB           no      Tree   C           O(|S| · |Z|)            Theorem 2
4    TB           yes     Tree   B           NP-hard                 [17]
5*   TB           yes     Tree   P           FPTAS                   Theorem 7
6*   TB           yes     Tree   B           FPTAS                   Theorem
7*   TB           yes     Tree   C           O(|S| · |Z|)            Theorem
8*   CM           no      Tree   B           NP-hard                 Theorem
9*   CM           yes     Tree   C           polynomial              Theorem


The results are summarized in Table 1 and can be divided into three categories.¹

First, we design a more efficient algorithm for computing optimal strategies for turn-based games on
DAGs. Compared to the previous state of the art (due to Letchford and Conitzer [17]), we reduce the
complexity by a factor proportional to the number of terminal states (see row 1 in Table 1).

Second, we show that correlation often reduces the computational complexity of finding optimal
strategies. In particular, we design several new polynomial-time algorithms for computing the optimal correlated
strategy to commit to for both turn-based and concurrent-move games (see rows 3, 7, 9).

Third, we study approximation algorithms for the NP-hard problems in this framework and provide
fully polynomial time approximation schemes for finding pure and behavioral Stackelberg equilibria for
turn-based games on trees with chance nodes (see rows 5, 6). We leave open the question of finding an
approximation for concurrent-move games on trees without chance nodes (see row 8).

1.2 Related Work 

There is a rich body of literature studying the problem of computing Stackelberg equilibria. The
computational complexity of the problem is known for one-shot games, Bayesian games, and selected
subclasses of extensive-form games [17] and infinite stochastic games [18, 13, 14]. Similarly, many
practical algorithms are also known and are typically based on solving multiple linear programs, or mixed-integer
linear programs for Bayesian and extensive-form games [2].

For one-shot games, the problem of computing a Stackelberg equilibrium is polynomial, in contrast to
the PPAD-completeness of computing a Nash equilibrium. The situation changes in extensive-form games, where
Letchford and Conitzer showed [17] that for many cases the problem is NP-hard, while it still remains
PPAD-complete for a Nash equilibrium [8]. More specifically, computing Stackelberg equilibria is polynomial only
for:

• games with perfect information with no chance on DAGs where the leader commits to a pure strategy,

• games with perfect information with no chance on trees.

Introducing chance or imperfect information leads to NP-hardness. However, several cases were unexplored
by the existing work, namely extensive-form games with perfect information and concurrent moves. We
address this subclass in this work.

¹ We stated a theorem for NP-hardness for the correlated case on DAGs that was similar to the original theorem for
behavioral strategies in an earlier version of this paper. Due to an error, the theorem has not been correctly proven and
the computational complexity for this case (i.e., computing optimal correlated strategies to commit to on DAGs) remains
currently open.

The computational complexity can also change when the leader commits to correlated strategies. This
extension of the Stackelberg notion to correlated strategies appeared in several works [6, 18, 27]. Conitzer
and Korzhyk [6] analyzed correlated strategies in one-shot games, providing a single linear program for
their computation. Letchford et al. [18] showed that the problem of finding optimal correlated strategies to
commit to is NP-hard in infinite discounted stochastic games.² Xu et al. [27] focused on using correlated
strategies in a real-world security-based scenario.

The detailed analysis of the impact when the leader can commit to correlated strategies has, however,
not been investigated sufficiently in the existing work. We address this extension and study the
complexity for multiple subclasses of extensive-form games. Our results show that for many cases the problem of
computing Stackelberg equilibria in correlated strategies is polynomial, compared to the NP-hardness in
behavioral strategies. Finally, these theoretical results also have practical algorithmic implications. An
algorithm that computes a Stackelberg equilibrium in correlated strategies can be used to compute a Stackelberg
equilibrium in behavioral strategies, allowing a significant speed-up in computation time.

2 Preliminaries 

We consider finite two-player sequential games. Note that for every finite set $K$, $\Delta(K)$ denotes the set of
probability distributions over $K$ and $\mathcal{P}(K)$ denotes the set of all subsets of $K$.

Definition 1 (2-player sequential game) A two-player sequential game is given by a tuple
$G = (\mathcal{N}, \mathcal{S}, \mathcal{Z}, \rho, \mathcal{A}, u, \mathcal{T}, \mathcal{C})$, where:

• $\mathcal{N} = \{1, 2\}$ is a set of two players;

• $\mathcal{S}$ is a set of non-terminal states;

• $\mathcal{Z}$ is a set of terminal states;

• $\rho : \mathcal{S} \to \mathcal{P}(\mathcal{N}) \cup \{c\}$ is a function that defines which player(s) act in a given state, or whether the
node is a chance node (in which case $\rho(s) = c$);

• $\mathcal{A}$ is a set of actions; we overload the notation to restrict the actions only for a single player as $\mathcal{A}_i$
and for a single state as $\mathcal{A}(s)$;

• $\mathcal{T} : \mathcal{S} \times \prod_{i \in \rho(s)} \mathcal{A}_i \to \mathcal{S} \cup \mathcal{Z}$ is a transition function between states depending on the
actions taken by all the players that act in this state. Overloading notation, $\mathcal{T}(s)$ also denotes the children of a
state $s$: $\mathcal{T}(s) = \{s' \in \mathcal{S} \cup \mathcal{Z} \mid \exists a \in \mathcal{A}(s) : \mathcal{T}(s, a) = s'\}$;

• $\mathcal{C} : \mathcal{A}_c \to [0, 1]$ are the chance probabilities on the edges outgoing from each chance node $s \in \mathcal{S}$,
such that $\sum_{a \in \mathcal{A}_c(s)} \mathcal{C}(a) = 1$;

• Finally, $u_i : \mathcal{Z} \to \mathbb{R}$ is the utility function for player $i \in \mathcal{N}$.

² More precisely, that work assumes that the correlated strategies can use a finite history.

In this paper we study Stackelberg equilibria, thus player 1 will be referred to as the leader and player 2 
as the follower. 

We say that a game is turn-based if there is a unique player acting in each state (formally,
$|\rho(s)| = 1$ for all $s \in \mathcal{S}$) and with concurrent moves if both players can act simultaneously in some state.
Moreover, the game is said to have no chance if there exist no chance nodes; otherwise the game is with chance.

A pure strategy $\pi_i \in \Pi_i$ of a player $i \in \mathcal{N}$ is an assignment of an action to play in each state of the
game ($\pi_i : \mathcal{S} \to \mathcal{A}_i$). A behavioral strategy $\sigma_i \in \Sigma_i$ is a probability distribution over actions in each state,
$\sigma_i : \mathcal{A}_i \to [0, 1]$, such that $\forall s \in \mathcal{S}, \forall i \in \rho(s) : \sum_{a \in \mathcal{A}_i(s)} \sigma_i(a) = 1$.

The expected utility of player $i$ given a pair of strategies $\sigma = (\sigma_1, \sigma_2)$ is defined as follows:

$$u_i(\sigma_1, \sigma_2) = \sum_{z \in \mathcal{Z}} u_i(z)\, p_\sigma(z),$$

where $p_\sigma(z)$ denotes the probability that leaf $z$ will be reached if both players follow the strategy from $\sigma$
and due to stochastic transitions corresponding to $\mathcal{C}$.
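The expected-utility formula can be made concrete with a short Python sketch. It assumes a hypothetical nested-dict encoding of the game tree (the keys `utility`, `chance`, `name`, and `children` are illustrative choices, not notation from the paper). The recursion multiplies probabilities along each path, which implicitly sums the leaf utilities weighted by their reach probabilities.

```python
def expected_utility(node, sigma):
    """Return (u_1, u_2) for behavioral profile sigma on a game tree.

    node  : hypothetical encoding; a leaf is {"utility": (u1, u2)},
            a chance node is {"chance": [(prob, child), ...]},
            a decision node is {"name": s, "children": {action: child}}.
    sigma : dict mapping state name -> {action: probability}.
    """
    if "utility" in node:                      # terminal state z
        return node["utility"]
    if "chance" in node:                       # stochastic transition
        branches = node["chance"]
    else:                                      # decision node of either player
        probs = sigma[node["name"]]
        branches = [(probs[a], child) for a, child in node["children"].items()]
    u1 = u2 = 0.0
    for prob, child in branches:
        c1, c2 = expected_utility(child, sigma)
        u1 += prob * c1
        u2 += prob * c2
    return (u1, u2)


# Hypothetical example: one decision state "s" leading to a leaf or a chance node.
game = {"name": "s", "children": {
    "L": {"utility": (2, 1)},
    "R": {"chance": [(0.5, {"utility": (0, 0)}), (0.5, {"utility": (4, 4)})]},
}}
sigma = {"s": {"L": 0.25, "R": 0.75}}
result = expected_utility(game, sigma)
```

Keying `sigma` by state name keeps the sketch close to the definition of a behavioral strategy as one distribution per state.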

A strategy $\sigma_i$ of player $i$ is said to represent a best response to the opponent’s strategy $\sigma_{-i}$ if
$u_i(\sigma_i, \sigma_{-i}) \ge u_i(\sigma'_i, \sigma_{-i})$ for all $\sigma'_i \in \Sigma_i$. Denote by $BR_i(\sigma_{-i}) \subseteq \Pi_i$ the set of all the pure best
responses of player $i$ to strategy $\sigma_{-i}$. We can now introduce formally the Stackelberg Equilibrium solution concept:

Definition 2 (Stackelberg Equilibrium) A strategy profile $\sigma = (\sigma_1, \sigma_2)$ is a Stackelberg Equilibrium if
$\sigma_1$ is an optimal strategy of the leader given that the follower best-responds to its choice. Formally, a
Stackelberg equilibrium in pure strategies is defined as

$$(\sigma_1, \sigma_2) = \arg\max_{\sigma'_1 \in \Pi_1,\ \sigma'_2 \in BR_2(\sigma'_1)} u_1(\sigma'_1, \sigma'_2),$$

while a Stackelberg equilibrium in behavioral strategies is defined as

$$(\sigma_1, \sigma_2) = \arg\max_{\sigma'_1 \in \Sigma_1,\ \sigma'_2 \in BR_2(\sigma'_1)} u_1(\sigma'_1, \sigma'_2).$$
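For very small games, the pure-strategy variant of this definition can be checked by exhaustive enumeration. The sketch below assumes a turn-based tree without chance nodes, encoded with hypothetical tuples `("leaf", u1, u2)` and `(player, state_name, {action: child})`. It is exponential in the number of states and is meant only to make the argmax concrete; ties among follower best responses are broken in the leader's favour, as the joint maximization in the definition implies.

```python
from itertools import product


def states_of(node, player, acc=None):
    """Collect (state name, action list) pairs for the states where `player` acts."""
    acc = [] if acc is None else acc
    if node[0] != "leaf":
        owner, name, children = node
        if owner == player:
            acc.append((name, list(children)))
        for child in children.values():
            states_of(child, player, acc)
    return acc


def pure_strategies(root, player):
    """Enumerate all pure strategies of `player` as dicts state -> action."""
    sts = states_of(root, player)
    names = [name for name, _ in sts]
    for choice in product(*[actions for _, actions in sts]):
        yield dict(zip(names, choice))


def outcome(node, pi1, pi2):
    """Follow the pure profile (pi1, pi2) from the root to a leaf; return (u1, u2)."""
    while node[0] != "leaf":
        owner, name, children = node
        node = children[(pi1 if owner == 1 else pi2)[name]]
    return node[1], node[2]


def pure_stackelberg(root):
    """Brute-force search over leader commitments and follower best responses."""
    best = None
    for pi1 in pure_strategies(root, 1):
        replies = [(outcome(root, pi1, pi2), pi2) for pi2 in pure_strategies(root, 2)]
        top = max(u[1] for u, _ in replies)                   # follower's best-response value
        u, pi2 = max((r for r in replies if r[0][1] == top),  # ties favour the leader
                     key=lambda r: r[0][0])
        if best is None or u[0] > best[0][0]:
            best = (u, pi1, pi2)
    return best


# Hypothetical game: the follower moves first ("out" ends the game), then the leader.
game = (2, "f", {
    "out": ("leaf", 0, 1),
    "in": (1, "l", {"a": ("leaf", 2, 2), "b": ("leaf", 3, 0)}),
})
equilibrium = pure_stackelberg(game)
```

In this hypothetical game the leader would prefer `b` once state `l` is reached, but committing to `a` is what makes the follower enter, which illustrates the value of commitment.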

Next, we describe the notion of a Stackelberg equilibrium where the leader can commit to a correlated 
strategy in a sequential game. The concept was suggested and investigated by Letchford et al. [18], but no
formal definition exists. Formalizing such a definition below, we observe that the definition is essentially the
“Stackelberg analogue” of the notion of Extensive-Form Correlated Equilibria (EFCE), introduced by von
Stengel and Forges [26]. This parallel turns out to be technically relevant as well.

Definition 3 (Stackelberg Extensive-Form Correlated Equilibrium) A probability distribution $\phi$ on pure
strategy profiles $\Pi$ is called a Stackelberg Extensive-Form Correlated Equilibrium (SEFCE) if it maximizes
the leader’s utility (that is, $\phi = \arg\max_{\phi' \in \Delta(\Pi)} u_1(\phi')$) subject to the constraint that whenever the play
reaches a state $s$ where the follower can act, the follower is recommended an action $a$ according to $\phi$ such
that the follower cannot gain by unilaterally deviating from $a$ in state $s$ (and possibly in all succeeding
states), given the posterior on the probability distribution of the strategy of the leader, defined by the actions
taken by the leader so far.
Figure 1: (Left) An example game with different outcomes depending on whether the leader commits to
behavioral or to correlated strategies. The leader acts in nodes $s_3$ and $s_4$, the follower acts in nodes $s_1$ and
$s_2$. Utility values are shown in the terminal states; the first value is the utility of the leader, the second value is the
utility of the follower. (Right) A visualization of the outcomes of the example game in the two-dimensional
utility space of the players: the horizontal axis corresponds to the utility of the follower, the vertical axis
corresponds to the utility of the leader. Red vertical lines visualize the minimal value the follower can
guarantee in $s_2$ or $s_1$, respectively; blue lines and points correspond to new outcomes that can be achieved if
the leader commits to correlated strategies.

We give an example to illustrate both variants of the Stackelberg solution concept. 

Example 1 Consider the game in Figure 1, where the follower moves first (in states $s_1$, $s_2$) and the leader
second (in states $s_3$, $s_4$). By committing to a behavioral strategy, the leader can gain utility 1 in the optimal
case: the leader commits to play left in state $s_3$ and right in $s_4$. The follower will then prefer playing right in
$s_2$ and left in $s_1$, reaching the leaf with utilities (1, 3). Note that the leader cannot gain more by committing
to strictly mixed behavioral strategies.

Now, consider the case when the leader commits to correlated strategies. We interpret the probability
distribution over strategy profiles $\phi$ as signals sent to the follower in each node where the follower acts,
while the leader is committing to play with respect to $\phi$ and the signals sent to the follower. This can be
shown in node $s_2$, where the leader sends one of two signals to the follower, each with probability 0.5. In
the first case, the follower receives the signal to move left, while the leader commits to play the uniform strategy
in $s_3$ and action left in $s_4$, reaching the utility value (2, 1) if the follower plays according to the signal. In
the second case, the follower receives the signal to move right, while the leader commits to play right in
$s_4$ and left in $s_3$, reaching the utility value (1, 3) if the follower plays according to the signal. By using this
correlation, the leader is able to get the utility of 1.5, while ensuring the utility of 2 for the follower; hence,
the follower will follow the only recommendation in node $s_1$ to play left.

The situation can be visualized using a two-dimensional space, where the x-axis represents the utility
of the follower and the y-axis represents the utility of the leader. This type of visualization was also used in
[17] and we use it further in the proof of Theorem 2. While the black nodes correspond to the utility points
of the leaves, the solid black lines correspond to outcomes when the leader randomizes between the leaves. The
follower plays a best-response action in each node; hence, in order to force the follower to play action left
in $s_2$, the leader must guarantee the follower the utility of at least 1 in the sub-game rooted in node $s_3$,
since the follower can get at least this value by playing right in $s_2$. Therefore, each state of the follower
restricts the set of possible outcomes of the game. These restrictions are visualized as the vertical dashed
lines: one corresponds to the described situation in node $s_2$, and the second one is due to the leaf following
node $s_1$. Considering only commitments to behavioral strategies, the best of all possible outcomes for the
leader is the point $(u_2 = 3, u_1 = 1)$. With correlation, however, the leader can achieve a mixture of points
$(u_2 = 1, u_1 = 2)$ and $(u_2 = 3, u_1 = 1)$ (the blue dashed line). This can also be interpreted as forming
a convex hull over all possible outcomes in the sub-tree rooted in node $s_2$. Note that without correlation,
the set of all possible outcomes is not generally a convex set. Finally, after restricting this set of possible
solutions due to the leaf following node $s_1$, the intersection point $(u_2 = 2, u_1 = 1.5)$ represents the expected utility for
the Stackelberg Extensive-Form Correlated Equilibrium solution concept.
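The arithmetic behind the intersection point can be reproduced with a few lines of Python. The helper `best_mixture` is a hypothetical name introduced here. It takes the two outcome points named in the example, written as (follower utility, leader utility), together with the follower's guaranteed value of 2 from the leaf following the first follower node. Because the leader's utility is linear along the segment, only the segment's endpoints and its crossing with the constraint need to be checked.

```python
def best_mixture(p, q, follower_floor):
    """Best leader payoff on the segment between outcomes p and q.

    p, q are (u_follower, u_leader) points; the mixture alpha*p + (1-alpha)*q
    must give the follower at least follower_floor. Returns (alpha, point),
    or None when no mixture is acceptable to the follower.
    """
    candidates = [0.0, 1.0]
    if p[0] != q[0]:
        crossing = (follower_floor - q[0]) / (p[0] - q[0])
        if 0.0 <= crossing <= 1.0:
            candidates.append(crossing)
    feasible = []
    for alpha in candidates:
        point = tuple(alpha * a + (1 - alpha) * b for a, b in zip(p, q))
        if point[0] >= follower_floor - 1e-12:   # small tolerance for rounding
            feasible.append((alpha, point))
    return max(feasible, key=lambda item: item[1][1]) if feasible else None


# Points from the example: (u_2 = 1, u_1 = 2) and (u_2 = 3, u_1 = 1); floor u_2 >= 2.
alpha, point = best_mixture((1, 2), (3, 1), follower_floor=2)
```

With these inputs the weight on the first point is 0.5 and the resulting point is (2.0, 1.5), matching the two equally likely signals described in the example.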

The example gives an intuition about the structure of the probability distribution $\phi$ in SEFCE. In each
state of the follower, the leader sends a signal to the follower and commits to follow the correlated strategy if
the follower accepts the recommendation, while simultaneously committing to punish the follower for each
deviation. This punishment is simply a strategy that minimizes the follower’s utility and will be useful in
many proofs; next we introduce some notation for it.

Let $\sigma^m$ denote a behavioral strategy profile, where in each sub-game the leader plays a minmax behavioral
strategy based on the utilities of the follower and the follower plays a best response. Moreover, for each state
$s \in \mathcal{S}$, we denote by $\mu(s)$ the expected utility of the follower in the sub-game rooted in state $s$ if both players
play according to $\sigma^m$ (i.e., the value of the corresponding zero-sum sub-game defined by the utilities of the
follower).
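For turn-based games, the follower's minmax value can be computed by plain backward induction. The following Python sketch assumes a hypothetical tuple encoding of the tree and covers only the turn-based case; a concurrent-move state would instead require solving a zero-sum matrix game, which is not shown.

```python
def mu(node):
    """Follower's minmax value of the sub-game rooted at node.

    Hypothetical encoding: ("leaf", u1, u2), ("leader", [children]),
    ("follower", [children]) or ("chance", [(prob, child), ...]).
    """
    kind = node[0]
    if kind == "leaf":
        return node[2]                                  # follower's utility at the leaf
    if kind == "chance":
        return sum(prob * mu(child) for prob, child in node[1])
    values = [mu(child) for child in node[1]]
    # The leader punishes (minimizes u_2); the follower best-responds (maximizes u_2).
    return min(values) if kind == "leader" else max(values)


# Hypothetical tree: the follower chooses between a leaf and a leader node.
tree = ("follower", [
    ("leaf", 0, 1),
    ("leader", [("leaf", 2, 2), ("leaf", 3, 0)]),
])
value = mu(tree)
```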

Note that being a probability distribution over pure strategy profiles, a SEFCE is, a priori, an object
of exponential size in the size of the description of the game, when it is described as a tree. This has to
be dealt with before we can consider computing it. The following lemma gives a compact representation
of the correlated strategies in a SEFCE and the proof yields an algorithm for constructing the probability
distribution $\phi$ from the compact representation. It is this compact representation that we seek to compute.

Lemma 1 For any turn-based or concurrent-move game in tree form, there exists a SEFCE $\phi \in \Delta(\Pi)$ that
can be compactly represented as a behavioral strategy profile $\sigma = (\sigma_1, \sigma_2)$ such that $\forall z \in \mathcal{Z} : p_\phi(z) = p_\sigma(z)$
and $\phi$ corresponds to the following behavior:

• the follower receives signals in each state $s$ according to $\sigma_2(a)$ for each action $a \in \mathcal{A}_2(s)$;

• the leader chooses the action in each state $s$ according to $\sigma_1(a)$ for each action $a \in \mathcal{A}_1(s)$ if the state
$s$ was reached by following the recommendations;

• both players switch to the minmax strategy $\sigma^m$ after a deviation by the follower.

Proof: Let $\phi'$ be a SEFCE. We construct the behavioral strategy profile $\sigma$ from $\phi'$ and then show how an
optimal strategy $\phi$ can be constructed from $\sigma$ and $\sigma^m$.

To construct $\sigma$, it is sufficient to specify a probability $\sigma(a)$ for each action $a \in \mathcal{A}(s)$ in each state $s$.
We use the probability of state $s$ being reached (denoted $\phi'(s)$), which corresponds to the sum of the probabilities
of the pure strategy profiles $\pi'$ such that the actions in strategy profile $\pi'$ allow state $s$ to be reached.

Formally, there exists a sequence $s_0, a_0, \ldots, a_{k-1}, s_k$ of states and actions (starting at the root), such
that for every $j = 0, \ldots, k-1$ it holds that $a_j = \pi'(s_j)$, $s_{j+1} = \mathcal{T}(s_j, a_j)$ (or $s_{j+1}$ is the next decision node
of some player if $\mathcal{T}(s_j, a_j)$ is a chance node), $s_0 = s_{root}$, and $s_k = s$. Let $\Pi(s)$ denote the set of pure strategy
profiles for which such a sequence exists for state $s$, and $\Pi(s, a) \subseteq \Pi(s)$ the strategy profiles that not only
reach $s$, but also prescribe action $a$ to be played in state $s$. We have:

$$\sigma(a) = \frac{\sum_{\pi' \in \Pi(s,a)} \phi'(\pi')}{\phi'(s)}, \quad \text{where } \phi'(s) = \sum_{\pi' \in \Pi(s)} \phi'(\pi').$$

In case $\phi'(s) = 0$, we set the behavior strategy in $\sigma$ arbitrarily.
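The marginalization step in this construction can be sketched in Python as follows. The encoding is hypothetical: a tree of tuples `("leaf",)` or `(state_name, {action: child})`, and a distribution given as a list of (probability, pure profile) pairs. Chance nodes are left out to keep the sketch short, and unreachable states receive a uniform strategy as one possible arbitrary choice.

```python
def behavioral_from_correlated(tree, phi):
    """Marginalize a distribution phi over pure profiles into sigma(a) per state.

    tree : hypothetical encoding, ("leaf",) or (state_name, {action: child}).
    phi  : list of (probability, profile) with profile a dict state -> action.
    """
    sigma = {}

    def visit(node, path):
        if node[0] == "leaf":
            return
        name, children = node
        # Profiles that reach this state: they prescribe every action on the path to it.
        reaching = [(p, pi) for p, pi in phi if all(pi[s] == a for s, a in path)]
        total = sum(p for p, _ in reaching)             # probability of reaching the state
        if total > 0:
            sigma[name] = {a: sum(p for p, pi in reaching if pi[name] == a) / total
                           for a in children}
        else:                                           # unreachable: any choice works
            sigma[name] = {a: 1.0 / len(children) for a in children}
        for a, child in children.items():
            visit(child, path + [(name, a)])

    visit(tree, [])
    return sigma


# Hypothetical two-state tree and a distribution over three pure profiles.
tree = ("f", {"out": ("leaf",),
              "in": ("l", {"a": ("leaf",), "b": ("leaf",)})})
phi = [(0.5, {"f": "in", "l": "a"}),
       (0.25, {"f": "in", "l": "b"}),
       (0.25, {"f": "out", "l": "b"})]
sigma = behavioral_from_correlated(tree, phi)
```

Note that the third profile prescribes `b` in state `l` but never reaches it, so it does not contribute to the strategy in `l`.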

Next, we construct a strategy $\phi$ that corresponds to the desired behavior and show that it is indeed
an optimal SEFCE strategy. We need to specify a probability for every pure strategy profile $\pi = (\pi_1, \pi_2)$.
Consider the sequence of states and actions that corresponds to executing the actions from the strategy profile
$\pi$. Let $s^l_0, a^l_0, \ldots, a^l_{k_l - 1}, s^l_{k_l}$ be one of $q$ possible sequences of states and actions (there can be multiple such
sequences due to chance nodes), such that for $j = 0, \ldots, k_l - 1$: $a^l_j = \pi(s^l_j)$, $s^l_{j+1} = \mathcal{T}(s^l_j, a^l_j)$ (or $s^l_{j+1}$ is one
of the next decision nodes of some player immediately following the chance node(s) $\mathcal{T}(s^l_j, a^l_j)$), $s^l_0 = s_{root}$,
and $s^l_{k_l} \in \mathcal{Z}$. The probability for the strategy profile $\pi$ corresponds to the probability of executing the
sequences of actions multiplied by the probability that the remaining actions prescribe the minmax strategy $\sigma^m$
in case the follower deviates:

$$\phi(\pi) = \left( \prod_{l=1}^{q} \prod_{j=0}^{k_l - 1} \sigma(a^l_j) \right) \cdot \prod_{a' = \pi(s') \,\mid\, s' \in \mathcal{S} \setminus \{s^1_0, \ldots, s^1_{k_1 - 1}, \ldots, s^q_0, \ldots, s^q_{k_q - 1}\}} \sigma^m(a').$$

Correctness By construction of $\sigma$ and $\phi$, it holds that the probability distribution over leaves remains the same
as in $\phi'$; hence, $\forall z \in \mathcal{Z} : p_{\phi'}(z) = p_\sigma(z) = p_\phi(z)$ and thus the expected utility of $\phi$ for the players is the
same as in $\phi'$.

Second, we have to show that the follower has no incentive to deviate from the recommendations in $\phi$.
By deviating to some action $a'$ in state $s$, the follower gains $\mu(\mathcal{T}(s, a'))$, since both players play according to
$\sigma^m$ after a deviation. In $\phi'$, the follower can get for the same deviation at best some utility value $v_2(\mathcal{T}(s, a'))$,
which by the definition of the minmax strategies $\sigma^m$ is greater than or equal to $\mu(\mathcal{T}(s, a'))$. Since the expected
utility of the follower for following the recommendations is the same in $\phi$ as in $\phi'$, and the follower has no
incentive to deviate in $\phi'$ because of the optimality, the follower has no incentive to deviate in $\phi$ either. □

3 Computing Exact Strategies in Turn-Based Games

We start our computational investigation with turn-based games.

Theorem 1 There is an algorithm that takes as input a turn-based game in DAG form with no chance nodes
and outputs a Stackelberg equilibrium in pure strategies. The algorithm runs in time $O(|\mathcal{S}|(|\mathcal{S}| + |\mathcal{Z}|))$.

Proof: Our algorithm performs three passes through all the nodes in the graph.

First, the algorithm computes the minmax values $\mu(s)$ of the follower for each node in the game by
backward induction.

Second, the algorithm computes a capacity for each state in order to determine which states of the game
are reachable (i.e., there exists a commitment of the leader and a best response of the follower such that the
state can be reached by following their strategies). The capacity of state $s$, denoted $\gamma(s)$, is defined as the
minimum utility of the follower that needs to be guaranteed by the outcome of the sub-game starting in state $s$
in order to make this state reachable. By convention $\gamma(s_{root}) = -\infty$, and we initially set $\gamma(s) = \infty$ for all
$s \in (\mathcal{S} \cup \mathcal{Z}) \setminus \{s_{root}\}$ and mark them as open.

Third, the algorithm evaluates each open state $s$ for which all parents have been marked as closed. We
distinguish whether the leader or the follower makes the decision:

• $s$ is a leader node: the algorithm sets $\gamma(s') = \min(\gamma(s'), \gamma(s))$ for all children $s' \in \mathcal{T}(s)$;

• $s$ is a follower node: the algorithm sets $\gamma(s') = \min(\gamma(s'), \max(\gamma(s), \max_{s'' \in \mathcal{T}(s) \setminus \{s'\}} \mu(s'')))$ for
all children $s' \in \mathcal{T}(s)$.

Finally, we mark state $s$ as closed.

We say that leaf $z \in \mathcal{Z}$ is a possible outcome if $\mu(z) = u_2(z) \ge \gamma(z)$. Now, the solution is
a possible outcome that maximizes the utility of the leader, i.e., $\arg\max_{z \in \mathcal{Z};\ u_2(z) \ge \gamma(z)} u_1(z)$. The strategy
is now constructed by following nodes from leaf $z$ back to the root while using nodes $s'$ with capacities
$\gamma(s') \le \mu(z)$. Due to the construction of capacities, such a path exists and forms a part of the Stackelberg
strategy. The leader commits to the strategy leading to max min utility for the follower in the remaining
states that are not part of this path.
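The three passes can be sketched in Python as follows. The dictionary encoding of the DAG and the function name `pure_stackelberg_dag` are assumptions made for illustration. The sibling maximum is computed naively here, so this sketch does not attain the running time claimed in the theorem, and it returns only the selected leaf together with the computed values rather than the full commitment.

```python
import math


def pure_stackelberg_dag(nodes, root):
    """Three-pass sketch for turn-based DAGs without chance nodes.

    nodes : hypothetical encoding, name -> ("leaf", u1, u2) or
            ("leader" | "follower", [child names]); children are distinct.
    Returns (best leaf, mu, gamma).
    """
    # Post-order DFS lists children before parents; its reverse is a topological order.
    order, seen = [], set()

    def dfs(n):
        if n in seen:
            return
        seen.add(n)
        if nodes[n][0] != "leaf":
            for child in nodes[n][1]:
                dfs(child)
        order.append(n)

    dfs(root)

    # Pass 1: follower's minmax values by backward induction.
    mu = {}
    for n in order:
        kind = nodes[n][0]
        if kind == "leaf":
            mu[n] = nodes[n][2]
        else:
            values = [mu[c] for c in nodes[n][1]]
            mu[n] = min(values) if kind == "leader" else max(values)

    # Passes 2 and 3: capacities, processing every node after all of its parents.
    gamma = {n: math.inf for n in order}
    gamma[root] = -math.inf
    for n in reversed(order):
        kind = nodes[n][0]
        if kind == "leaf":
            continue
        children = nodes[n][1]
        for c in children:
            need = gamma[n]
            if kind == "follower":
                # Naive sibling maximum; quadratic in the number of children.
                need = max([need] + [mu[o] for o in children if o != c])
            gamma[c] = min(gamma[c], need)

    possible = [z for z in order if nodes[z][0] == "leaf" and nodes[z][2] >= gamma[z]]
    best = max(possible, key=lambda z: nodes[z][1])
    return best, mu, gamma


# Hypothetical game: the follower may leave ("out") or let the leader choose a or b.
nodes = {
    "f": ("follower", ["out", "l"]),
    "out": ("leaf", 0, 1),
    "l": ("leader", ["a", "b"]),
    "a": ("leaf", 2, 2),
    "b": ("leaf", 3, 0),
}
best, mu, gamma = pure_stackelberg_dag(nodes, "f")
```

In this hypothetical game leaf `b` is excluded because its follower utility 0 is below its capacity 1, so the leader's best possible outcome is leaf `a`.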

Complexity Analysis Computing the max min values can be done in $O(|\mathcal{S}|(|\mathcal{S}| + |\mathcal{Z}|))$ by backward
induction due to the fact that the graph is a DAG. In the second pass, the algorithm solves the widest-path problem
from a single source to all leaves. In each node, the algorithm calculates capacities for every child. In nodes
where the leader acts, there is a constant-time operation performed for each child. However, we need to be
more careful in nodes where the follower acts. For each child $s' \in \mathcal{T}(s)$ the algorithm computes the maximum
value $\mu(s'')$ over all of the siblings. We can do this efficiently by computing the two maximal values of $\mu(s')$ over all
$s' \in \mathcal{T}(s)$ (attained at, say, $s^1$ and $s^2$); for each child, the term $\max_{s'' \in \mathcal{T}(s) \setminus \{s'\}} \mu(s'')$ then equals $\mu(s^1)$ if $s' \ne s^1$,
or $\mu(s^2)$ if $s' = s^1$. Therefore, the second pass can again be done in $O(|\mathcal{S}|(|\mathcal{S}| + |\mathcal{Z}|))$. Finally, finding
the optimal outcome and constructing the optimal strategy is again at most linear in the size of the graph.
Therefore the algorithm takes at most $O(|\mathcal{S}|(|\mathcal{S}| + |\mathcal{Z}|))$ steps. □
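The two-largest-values trick from the complexity analysis can be written as a small helper. The name `max_of_siblings` is hypothetical; the function takes the list of values of the children of one follower node and returns, for each child, the maximum over its siblings in a single linear scan.

```python
import math


def max_of_siblings(values):
    """For each index i, return max(values[j] for j != i) in linear time.

    A child without siblings gets -inf, so it imposes no extra constraint.
    """
    first = second = -math.inf      # largest and second-largest value seen so far
    first_idx = None                # position of the largest value
    for i, v in enumerate(values):
        if v > first:
            second, first, first_idx = first, v, i
        elif v > second:
            second = v
    # Every child sees the largest value, except the largest child itself.
    return [second if i == first_idx else first for i in range(len(values))]


sibling_max = max_of_siblings([3, 1, 4, 1, 5])
```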

Next we provide an algorithm for computing a Stackelberg extensive-form correlated equilibrium for 
turn-based games with no chance nodes. 

Theorem 2 There is an algorithm that takes as input a turn-based game in tree form with no chance nodes
and outputs an SEFCE in the compact representation. The algorithm runs in time $O(|\mathcal{S}||\mathcal{Z}|)$.

Proof: We improve the algorithm from the proof of Theorem 4 in [17]. The algorithm contains two steps:
(1) a bottom-up dynamic program that for each node $s$ computes the set of possible outcomes, (2) a
downward pass constructing the optimal correlated strategy in the compact representation.

For each node $s$ we keep a set of points $H_s$ in two-dimensional space, where the x-dimension represents
the utility of the follower and the y-dimension represents the utility of the leader. These points define the
convex set of all possible outcomes of the sub-game rooted in node $s$ (we assume that $H_s$ contains only the
points on the boundary of the convex hull). We keep each set $H_s$ sorted by polar angle.

Upward pass In leaf $z \in \mathcal{Z}$, we set $H_z = \{z\}$. In nodes $s$ where the leader acts, the set of points $H_s$ is
equal to the convex hull of the corresponding sets of the children $H_w$. That is, $H_s = \mathrm{Conv}(\bigcup_{w \in \mathcal{T}(s)} H_w)$.

In nodes $s$ where the follower acts, the algorithm performs two steps. First, the algorithm removes from
each set $H_w$ of child $w$ the outcomes from which the follower has an incentive to deviate. To do this, the
algorithm uses the maxmin $u_2$ values of all other children of $s$ except $w$ and creates a new set $\hat{H}_w$ that we
call the restricted set. The restricted set $\hat{H}_w$ is defined as an intersection of the convex set representing all
possible outcomes $H_w$ and all outcomes defined by the halfspace restricting the utility $x$ of the follower by
the inequality:

$$x \ge \max_{w' \in \mathcal{T}(s);\ w' \ne w}\ \min_{p' \in H_{w'}} u_2(p').$$

Second, the algorithm computes the set $H_s$ by creating a convex hull of the corresponding restricted sets
$\hat{H}_w$ of the children $w$. That is, $H_s = \mathrm{Conv}(\bigcup_{w \in \mathcal{T}(s)} \hat{H}_w)$.

Finally, in the root of the game tree, the outcome of the Stackelberg Extensive-Form Correlated
Equilibrium is the point with maximal payoff of player 1: $p_{SE} = \arg\max_{p \in H_{s_{root}}} u_1(p)$.
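The upward pass can be illustrated with the Python sketch below. It recomputes each hull from scratch with Andrew's monotone chain instead of merging hulls kept sorted by polar angle, so it shows the geometry of the step but does not attain the running time stated in the theorem. All function names are hypothetical, points are written as (follower utility, leader utility), and the leaf payoffs in the usage part are assumptions loosely modelled on Example 1, since the figure itself is not reproduced here.

```python
import math


def cross(o, a, b):
    """Twice the signed area of triangle o, a, b (positive for a left turn)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def hull(points):
    """Convex hull (Andrew's monotone chain), vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def half(seq):
        out = []
        for p in seq:
            while len(out) >= 2 and cross(out[-2], out[-1], p) <= 0:
                out.pop()
            out.append(p)
        return out

    lower, upper = half(pts), half(reversed(pts))
    return lower[:-1] + upper[:-1]


def clip(poly, t):
    """Restrict a convex vertex list to the half-plane x >= t (follower utility >= t)."""
    n = len(poly)
    edges = n if n > 2 else n - 1          # a segment has one edge, a point none
    out = []
    for i in range(n):
        p, q = poly[i], poly[(i + 1) % n]
        if p[0] >= t:
            out.append(p)
        if i < edges and (p[0] - t) * (q[0] - t) < 0:   # edge crosses the line x = t
            lam = (t - p[0]) / (q[0] - p[0])
            out.append((t, p[1] + lam * (q[1] - p[1])))
    return hull(out)


def leader_node(child_hulls):
    """Leader node: convex hull of the union of the children's outcome sets."""
    return hull([p for h in child_hulls for p in h])


def follower_node(child_hulls):
    """Follower node: restrict every child by its siblings' guarantees, then take the hull."""
    floors = [min(p[0] for p in h) for h in child_hulls]   # min follower utility per child
    restricted = []
    for i, h in enumerate(child_hulls):
        t = max((f for j, f in enumerate(floors) if j != i), default=-math.inf)
        restricted.extend(clip(h, t))
    return hull(restricted)


# Hypothetical payoffs loosely modelled on Example 1; points are (u_2, u_1).
H_s3 = leader_node([[(0, 3)], [(2, 1)]])
H_s4 = leader_node([[(1, 0)], [(3, 1)]])
H_s2 = follower_node([H_s3, H_s4])
H_s1 = follower_node([H_s2, [(2, 0)]])
p_se = max(H_s1, key=lambda p: p[1])
```

With these assumed payoffs the root outcome gives the follower utility 2 and the leader utility 1.5, the same intersection point as in the example.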

Downward pass We now construct the compact representation of commitment to correlated strategies that 
ensures the outcome pse calculated in the upward pass. The method for determining the optimal strategy in 
each node is similar to the method Strategy!*, p") used in the proof of Theorem 4 in itTTl . 

Given a node s and a point p" that lies on the boundary of H s , this method specifies how to commit 
to correlated strategies in the sub-tree rooted in node s. Moreover, the proof in ifTTl also showed that it is 
sufficient to consider mixtures of at most two actions in each node and allowing correlated strategies does 
violate their proof. We consider separately leader and follower nodes: 

• For each node s where the leader acts, the algorithm needs to find two points p,p' in the boundaries 
of children H w and H w >, such that the desired point p" is a convex combination of p E H w and p' E H w i. 
If iv = w', then the strategy in node s is to commit to pure strategy leading to node w. If w / w', then 
the strategy to commit to in node s is a mixture: with probability a to play action leading to w and with 
probability (1 — a) to play action leading to w', where a E [0,1] is such that p" = ap + (1 — a)p'. Finally, 
for every child s' E T(s) we call the method strategy with appropriate p (or p') in case s' = w (or w'), and 
with the threat value corresponding to p(s') for every other child. 

• For each node $s$ where the follower acts, the algorithm again needs to find two points $p, p'$ in the
restricted boundaries of children $H_w$ and $H_{w'}$, such that the desired point $p''$ is a convex combination of
$p \in H_w$ and $p' \in H_{w'}$. The restricted sets are used because the follower must not have an
incentive to deviate from the recommendation.

Similarly to the previous case, if $w = w'$, then the correlated strategy in node $s$ is to send the follower
the signal leading to node $w$ while committing further to play strategy$(w, p)$ in the sub-tree rooted in node $w$, and
to play the minmax strategy in every other child $s'$, corresponding to value $\mu(s')$.

If $w \neq w'$, then there is a mixture of possible signals: with probability $\alpha$ the follower receives a signal
to play the action leading to $w$, and with probability $(1 - \alpha)$ a signal to play the action leading to $w'$, where
$\alpha \in [0,1]$ is again such that $p'' = \alpha p + (1 - \alpha)p'$. As before, by sending the signal to play a certain action,
the leader commits to play strategy$(w, p)$ (or strategy$(w', p')$) in the sub-tree rooted in node $w$ (or $w'$)
and to play the minmax strategy leading to value $\mu(s')$ in every other child $s'$.
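
The mixing probability used in both cases of the downward pass can be recovered from the three points directly. The following is a minimal sketch, assuming outcomes are given as pairs of exact rationals or integers; the function name `mixing_weight` is hypothetical and the search for the two boundary points themselves is not shown.

```python
from fractions import Fraction


def mixing_weight(p, p_prime, target):
    """Return alpha in [0, 1] with target == alpha * p + (1 - alpha) * p_prime."""
    if tuple(p) == tuple(p_prime):
        if tuple(target) != tuple(p):
            raise ValueError("target is not on the (degenerate) segment")
        return Fraction(1)  # pure commitment: both points come from the same child
    # Solve along a coordinate in which the two endpoints differ.
    axis = 0 if p[0] != p_prime[0] else 1
    alpha = Fraction(target[axis] - p_prime[axis]) / Fraction(p[axis] - p_prime[axis])
    # Verify the other coordinate and the range so that bad inputs fail loudly.
    mixed = tuple(alpha * a + (1 - alpha) * b for a, b in zip(p, p_prime))
    if mixed != tuple(target) or not 0 <= alpha <= 1:
        raise ValueError("target is not a convex combination of p and p_prime")
    return alpha


print(mixing_weight((2, 1), (1, 3), (Fraction(3, 2), 2)))
```

With the returned weight, the leader plays (or recommends) the action leading to the child containing the first point with probability alpha and the other action with probability 1 - alpha.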

Correctness By the construction of the sets of points $H_s$ that are maintained for each node $s$, these
points correspond to the convex hull of all possible outcomes in the sub-game rooted in node $s$. In leaves,
the algorithm adds the point corresponding to the leaf. In the leader's nodes, the algorithm creates convex
combinations of all possible outcomes in the children of the node. The only places where the algorithm
removes some outcomes from these sets are nodes of the follower. If a point is removed from $H_w$ in node
$s$, there exists an action of the follower in $s$ that guarantees the follower a strictly better expected payoff
than the expected payoff of the outcome that corresponds to the removed point. Therefore, such an outcome
is not possible, as the follower would have an incentive to deviate. The outcome selected in the root node is
the possible outcome that maximizes the payoff of the leader over all possible outcomes; hence, it is optimal
for the leader. Finally, the downward pass constructs the compact representation of the optimal correlated
strategy to commit to that reaches the optimal outcome.

Complexity Analysis Computing the boundary of the convex hull $H_s$ takes $O(|Z|)$ time in each level of the
game tree since the children sets $H_w$ are already sorted [10, p. 6]. Moreover, since we keep only points on
the boundary of the convex hull, the inequality $\sum_{s \in S} |H_s| \le |Z|$ for nodes in a single level of the game
tree also bounds the number of lines that need to be checked in the downward pass. Therefore, each pass
takes at most $O(|S||Z|)$ time. □

Interestingly, the algorithm described in the proof of Theorem 2 can also be modified to handle the case where
the game contains chance, as shown in the next theorem. This is in contrast to computing a Stackelberg
equilibrium, which is NP-hard with chance.

Theorem 3 There is an algorithm that takes as input a turn-based game in tree form with chance nodes and
outputs the compact form of an SEFCE for the game. The algorithm runs in time $O(|S||Z|)$.

Proof: We can use the proof from Theorem 2 but need to analyze what happens in chance nodes in the
upward pass. In chance nodes the algorithm computes the Minkowski sum of all convex sets in the child nodes.
Since all sets are sorted and this is a planar case, this operation can again be performed in linear time [10,
p. 279]. The size of the set $H_s$ is again bounded by the number of all leaves [12]. □
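
A chance node can be illustrated with the following sketch. It assumes that each child set is first scaled by the probability with which chance selects that child (an assumption of this sketch, made so that the result is the set of expected outcomes), and it uses a simple quadratic construction (all pairwise sums followed by a hull) instead of the linear-time merge of sorted boundaries. The name `chance_node_hull` is hypothetical.

```python
from fractions import Fraction
from itertools import product


def cross(o, a, b):
    """Signed area of the turn o -> a -> b (positive means counter-clockwise)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain; returns the hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def chance_node_hull(children):
    """children: list of (probability, vertices of H_w). Returns the hull of the
    probability-weighted Minkowski sum of the children sets."""
    sums = [(Fraction(0), Fraction(0))]
    for prob, H_w in children:
        scaled = [(prob * x, prob * y) for x, y in H_w]
        pairwise = [(a + c, b + d) for (a, b), (c, d) in product(sums, scaled)]
        sums = convex_hull(pairwise)
    return sums


half = Fraction(1, 2)
print(chance_node_hull([(half, [(0, 0), (2, 0)]), (half, [(0, 0), (0, 2)])]))
```

In the example, two segments of length 2 along the axes, each reached with probability 1/2, combine into the unit square.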

4 Computing Exact Strategies in Concurrent-Move Games 

Next we analyze concurrent-move games and show that while the problem of computing a Stackelberg
equilibrium in behavior strategies is NP-hard (even without chance nodes), the problem of computing a
Stackelberg extensive-form correlated equilibrium can be solved in polynomial time.

Theorem 4 Given a concurrent-move game in tree form with no chance nodes and a number $\alpha$, it is
NP-hard to decide if the leader achieves payoff at least $\alpha$ in a Stackelberg equilibrium in behavior strategies.

The proof of the above hardness result is included in the appendix (Section 7.1); it uses a
reduction from the NP-complete problem Knapsack.

Theorem 5 For a concurrent-move game in tree form, the compact form of an SEFCE for the game can be
found in polynomial time by solving a single linear program.

Proof: We construct a linear program (LP) based on the LP for computing Extensive-Form Correlated
Equilibria (EFCE) [26]. We use the compact representation of SEFCE strategies (described by Lemma 1),
represented by variables $\delta(s)$ that denote the joint probability that state $s$ is reached when both players, and
chance, play according to SEFCE strategies.
The size of the original EFCE LP (both the number of variables and constraints) is quadratic in the
number of sequences of players. However, the LP for EFCE is defined for a more general class of imperfect-
information games without chance. In our case, we can exploit the specific structure of a concurrent-move
game and, together with the Stackelberg assumption, reduce the number of constraints and variables.

First, a deviation from a recommended strategy causes the game to reach a different sub-game in
which the strategy of the leader can be chosen (almost) independently of the sub-game that follows the
recommendation.

Second, the strategy that the leader should play after a deviation is a minmax strategy, with
which the leader punishes the follower by minimizing the utility of the follower as much as possible. Thus,
by deviating to action $a'$ in state $s$, the follower can get at best the minmax value of the sub-game starting
in node $\mathcal{T}(s,a')$, which we denote by $\mu(\mathcal{T}(s,a'))$. The values $\mu(s)$ for each state $s \in S$ can be computed
beforehand using backward induction.

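The linear program is as follows.

\begin{align}
\max_{\delta, v_2} \quad & \sum_{z \in Z} \delta(z)\, u_1(z) \tag{1} \\
\text{subject to:} \quad & \delta(s_{root}) = 1 \tag{2} \\
& 0 \le \delta(s) \le 1 && \forall s \in S \tag{3} \\
& \delta(s) = \sum_{s' \in \mathcal{T}(s)} \delta(s') && \forall s \in S;\ \rho(s) = \{1,2\} \tag{4} \\
& \delta(\mathcal{T}(s,a_c)) = \delta(s)\, C(s,a_c) && \forall s \in S\ \forall a_c \in \mathcal{A}_c(s);\ \rho(s) = \{c\} \tag{5} \\
& v_2(z) = u_2(z)\, \delta(z) && \forall z \in Z \tag{6} \\
& v_2(s) = \sum_{s' \in \mathcal{T}(s)} v_2(s') && \forall s \in S \tag{7} \\
& \sum_{a_1 \in \mathcal{A}_1(s)} v_2(\mathcal{T}(s, a_1 \times a_2)) \ge \sum_{a_1 \in \mathcal{A}_1(s)} \delta(\mathcal{T}(s, a_1 \times a_2))\, \mu(\mathcal{T}(s, a_1 \times a_2')) && \forall s \in S\ \forall a_2, a_2' \in \mathcal{A}_2(s) \tag{8}
\end{align}
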

The interpretation is as follows. Variables $\delta$ represent the compact form of the correlated strategies.

Equation (2) ensures that the probability of reaching the root state is 1, while Equation (3) ensures that
for each state $s$, we have $\delta(s)$ between 0 and 1.

Network-flow constraints: the probability of reaching a state equals the sum of probabilities of reaching
all possible children (Equation (4)) and it must correspond with the probability of actions in chance nodes
(Equation (5)). The objective function ensures that the LP finds a correlated strategy that maximizes the
leader's utility.

The follower must have no incentive to deviate from the recommendations given by $\delta$. To this end, variables
$v_2(s)$ represent the expected payoff for the follower in a sub-game rooted in node $s \in S$ when played
according to $\delta$, as defined by Equations (6) and (7). Each action that is recommended by $\delta$ must guarantee the follower
at least the utility she gets by deviating from the recommendation. This is ensured by Equation (8), where
the expected utility for the recommended action $a_2$ is expressed by the left side of the constraint, while the
expected utility for deviating is expressed by the right side of the constraint.

Note that the expected utility on the right-hand side of Equation (8) is calculated by considering the
posterior probability after receiving the recommendation $a_2$ and the minmax values of the children states after
playing $a_2'$, that is, $\mu(\mathcal{T}(s, a_1 \times a_2'))$.

Therefore, the variables $\delta$ found by solving this linear program correspond to the compact representation
of the optimal SEFCE strategy. □
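
To make the incentive constraint concrete, the sketch below checks it for a single concurrent-move state whose children are all leaves. This is an illustration under assumptions, not the LP itself: for a leaf the minmax value is taken to be the follower's payoff in that leaf, the root is reached with probability 1, and the names `follower_obeys`, `delta` and `u2` are hypothetical.

```python
def follower_obeys(delta, u2, leader_actions, follower_actions, tol=1e-9):
    """Check the no-deviation constraint in a one-shot concurrent-move state.

    delta[(a1, a2)] is the probability of reaching the leaf after the joint
    action a1 x a2; u2[(a1, a2)] is the follower's payoff in that leaf and
    also serves as the minmax value of the leaf.
    """
    for a2 in follower_actions:
        # Left-hand side: expected follower payoff when obeying recommendation a2.
        follow = sum(delta.get((a1, a2), 0) * u2[(a1, a2)] for a1 in leader_actions)
        for dev in follower_actions:
            # Right-hand side: same posterior weights, but the follower plays dev.
            deviate = sum(delta.get((a1, a2), 0) * u2[(a1, dev)] for a1 in leader_actions)
            if follow < deviate - tol:
                return False
    return True


u2 = {("U", "l"): 1, ("U", "r"): 0, ("D", "l"): 0, ("D", "r"): 1}
print(follower_obeys({("U", "l"): 0.5, ("D", "r"): 0.5}, u2, ["U", "D"], ["l", "r"]))
print(follower_obeys({("U", "r"): 1.0}, u2, ["U", "D"], ["l", "r"]))
```

The first distribution always recommends the follower's preferred action given the leader's move, so it passes; the second recommends r while the leader plays U, so the follower would rather deviate to l.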

5 Approximating Optimal Strategies 

In this section, we describe fully polynomial time approximation schemes for finding a Stackelberg equilibrium
in behavioral strategies as well as in pure strategies for turn-based games on trees with chance nodes.

We start with the problem of computing behavioral strategies for turn-based games on trees with chance
nodes.

Theorem 6 There is an algorithm that takes as input a turn-based game on a tree with chance nodes
and a parameter $\epsilon$, and computes a behavioral strategy for the leader. That strategy, combined with some
best response of the follower, achieves a payoff that differs by at most $\epsilon$ from the payoff of the leader in
a Stackelberg equilibrium in behavioral strategies. The algorithm runs in time $O(\epsilon^{-3}(U H_T)^3 T)$, where
$U = \max_{\sigma,\sigma'} u_1(\sigma) - u_1(\sigma')$, $T$ is the size of the game tree and $H_T$ is its height.

Proof: The exact version of this problem was shown to be NP-hard by Letchford and Conitzer [11]. Their
hardness proof was a reduction from Knapsack, and our algorithm is closely related to the classical
approximation scheme for this problem. We present the algorithm here, and defer the proof of correctness
to the appendix.

Our scheme uses dynamic programming to construct a table of values for each node in the tree. Each
table contains a discretized representation of the possible tradeoffs between the utility that the leader can
get and the utility that can at the same time be offered to the follower. In the appendix, we show that the
cumulative error in the leader's utility is bounded additively by the height of the tree. This error only depends
on the height of the tree and not on the utility. By an initial scaling of the leader's utility by a factor $D$, the error
can be made arbitrarily small, at the cost of extra computation time. This scaling is equivalent to discretizing
the leader's payoff to multiples of some small $\delta = 1/D$. For simplicity, we only describe the scheme for
binary trees, since nodes with higher branching factor can be replaced by small equivalent binary trees.

An important property is that only the leader's utility is discretized, since we need to be able to reason
correctly about the follower's actions. The tables are indexed by the leader's utility and contain values that
are the follower's utility. More formally, for each sub-tree $T$ we will compute a table $A_T$ with the following
guarantee for each index $k$ in each table:

a) the leader has a strategy for the game tree $T$ that offers the follower utility $A_T[k]$ while securing utility
at least $k$ to the leader.

b) no strategy of the leader can (starting from sub-tree $T$) offer the follower utility strictly more than
$A_T[k]$, while securing utility at least $k + H_T$ to the leader, where $H_T$ is the height of the tree $T$.

This also serves as our induction hypothesis for proving correctness. For commitment to pure strategies, a 
similar table is used with the same guarantee, except quantifying over pure strategies instead. 

We will now examine each type of node, and for each show how the table is constructed. For each node
$T$, we let $L$ and $R$ denote the two successors (if any), and we let $A_T$, $A_L$, and $A_R$ denote their respective
tables. Each table will have $n = H_T U/\epsilon$ entries.
If $T$ is a leaf with utility $(u_1, u_2)$, the table can be filled directly from the definition:

$$A_T[k] := \begin{cases} u_2 & \text{if } k \le u_1 \\ -\infty & \text{otherwise} \end{cases}$$

Both parts of the induction hypothesis are trivially satisfied by this.
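
As a small illustration of the leaf case, the table can be written down directly. The sketch assumes that the leader's utility `u1` has already been scaled so that the table indices 0, ..., n-1 are the discretized leader utilities; the name `leaf_table` is hypothetical.

```python
import math


def leaf_table(u1, u2, n):
    """Table of a leaf with (scaled) utilities (u1, u2): the follower can be
    offered u2 whenever the requested leader utility k does not exceed u1."""
    return [u2 if k <= u1 else -math.inf for k in range(n)]


print(leaf_table(2, 1, 4))
```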

If $T$ is a leader node, and the leader plays $L$ with probability $p$, followed up by the strategies that gave
the guarantees for $A_L[i]$ and $A_R[j]$, then the leader would get an expected $pi + (1 - p)j$, while being able
to offer $pA_L[i] + (1 - p)A_R[j]$ to the follower. For a given $k$, the optimal combination of the computed
tradeoffs becomes: $A_T[k] := \max_{i,j,p} \{ pA_L[i] + (1 - p)A_R[j] \mid pi + (1 - p)j \ge k \}$. This table can be
computed in time $O(n^3)$ by looping over all $0 \le i,j,k < n$, and taking the maximum with the extremal
feasible values of $p$.
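
The cubic loop for a leader node can be sketched as follows. For fixed i, j and k, the feasible values of p form an interval and the objective is linear in p, so only the two endpoints of the interval need to be evaluated. The function names are hypothetical, and a term with weight zero is skipped so that an unreachable entry (minus infinity) with zero probability does not spoil the value.

```python
import math


def weighted(p, x, y):
    """p*x + (1-p)*y, skipping a term whose weight is zero (avoids 0 * -inf)."""
    total = 0.0
    if p > 0:
        total += p * x
    if p < 1:
        total += (1 - p) * y
    return total


def feasible_interval(i, j, k):
    """Interval of p in [0, 1] with p*i + (1-p)*j >= k, or None if it is empty."""
    if i == j:
        return (0.0, 1.0) if i >= k else None
    t = (k - j) / (i - j)  # boundary of the constraint p*(i-j) >= k-j
    lo, hi = (max(0.0, t), 1.0) if i > j else (0.0, min(1.0, t))
    return (lo, hi) if lo <= hi else None


def leader_table(A_L, A_R):
    """A_T[k] = max over i, j, p of p*A_L[i] + (1-p)*A_R[j] s.t. p*i + (1-p)*j >= k."""
    n = len(A_L)
    A_T = [-math.inf] * n
    for k in range(n):
        for i in range(n):
            for j in range(n):
                interval = feasible_interval(i, j, k)
                if interval is None:
                    continue
                for p in interval:  # linear objective: an endpoint is optimal
                    A_T[k] = max(A_T[k], weighted(p, A_L[i], A_R[j]))
    return A_T


inf = math.inf
print(leader_table([1, 1, 1, -inf], [3, -inf, -inf, -inf]))
```

In the example the children are leaves with utilities (2, 1) and (0, 3). To secure leader utility 1 the leader must play L with probability at least 1/2, which lets her offer the follower 2.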

If $T$ is a chance node, where the probability of $L$ is $p$, and the leader combines the strategies that gave
the guarantees for $A_L[i]$ and $A_R[j]$, then the leader would get an expected $pi + (1 - p)j$ while being able
to offer $pA_L[i] + (1 - p)A_R[j]$ to the follower. For a given $k$, the optimal combination of the computed
tradeoffs becomes: $A_T[k] := \max_{i,j} \{ pA_L[i] + (1 - p)A_R[j] \mid pi + (1 - p)j \ge k \}$. The table $A_T$ can
thus be filled in time $O(n^3)$ by looping over all $0 \le i,j,k < n$, and this can even be improved to $O(n^2)$ by
a simple optimization.
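
One way to obtain a quadratic running time for a chance node is sketched below; the text does not spell out the optimization, so this particular choice is an assumption. Each pair (i, j) is feasible for exactly the indices k up to the floor of pi + (1 - p)j, so its value is recorded at that largest index and a suffix maximum then fills the table. The chance probability is assumed to be a `Fraction` strictly between 0 and 1, and the name `chance_table` is hypothetical.

```python
import math
from fractions import Fraction


def chance_table(A_L, A_R, p):
    """A_T[k] = max over i, j of p*A_L[i] + (1-p)*A_R[j] s.t. p*i + (1-p)*j >= k."""
    n = len(A_L)
    best_at = [-math.inf] * n
    for i in range(n):
        for j in range(n):
            if A_L[i] == -math.inf or A_R[j] == -math.inf:
                continue  # no strategy behind one of the entries
            # Largest table index for which the pair (i, j) is feasible.
            k_max = min(n - 1, math.floor(p * i + (1 - p) * j))
            value = p * A_L[i] + (1 - p) * A_R[j]
            best_at[k_max] = max(best_at[k_max], value)
    # A pair feasible for k_max is feasible for every smaller k: suffix maximum.
    A_T = best_at[:]
    for k in range(n - 2, -1, -1):
        A_T[k] = max(A_T[k], A_T[k + 1])
    return A_T


inf = math.inf
print(chance_table([1, 1, 1, -inf], [3, -inf, -inf, -inf], Fraction(1, 2)))
```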

If $T$ is a follower node, then if the leader combines the strategy for $A_L[i]$ in $L$ with the minmax strategy
for $R$, then the follower's best response is $L$ iff $A_L[i] \ge \mu(R)$, and similarly it is $R$ if $A_R[j] \ge \mu(L)$. Thus,
the optimal combination becomes

$$A_T[k] := \max\left(A_L[k]\downarrow_{\mu(R)},\ A_R[k]\downarrow_{\mu(L)}\right), \qquad x\downarrow_{\mu} := \begin{cases} x & \text{if } x \ge \mu \\ -\infty & \text{otherwise} \end{cases}$$

The table $A_T$ can be filled in time $O(n)$.
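
The follower-node rule is a pointwise operation on the two child tables, which the following sketch illustrates. The minmax values of the two children are assumed to have been computed beforehand and are passed in as `mu_L` and `mu_R`; the function names are hypothetical.

```python
import math


def restrict(x, threshold):
    """Keep x only if it is at least the threat value; otherwise the entry is lost."""
    return x if x >= threshold else -math.inf


def follower_table(A_L, A_R, mu_L, mu_R):
    """Entry k keeps A_L[k] if the follower prefers it to the threat mu_R in R,
    and A_R[k] if the follower prefers it to the threat mu_L in L."""
    return [max(restrict(a_l, mu_R), restrict(a_r, mu_L)) for a_l, a_r in zip(A_L, A_R)]


inf = math.inf
print(follower_table([1, 1, 1, -inf], [3, -inf, -inf, -inf], mu_L=1, mu_R=3))
print(follower_table([1, 1, 1, -inf], [3, -inf, -inf, -inf], mu_L=1, mu_R=0))
```

In the first call the follower can secure 3 in R, so no entry of the left table survives; in the second call the threat in R is weak enough for the left entries to remain available.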

Putting it all together, each table can be computed in time $O(n^3)$, and there is one table for each
node in the tree, which gives the desired running time. Let $A_T$ be the table for the root node, and let
$i' = \max\{i \mid A_T[i] > -\infty\}$. The strategy associated with $A_T[i']$ guarantees utility that is at most $H_T$
from the best possible guarantee in the scaled game, and therefore at most $\epsilon$ from the best possible guarantee
in the original game.

This completes the proof of the theorem. □ 

Next, we prove the analogous statement for the case of pure strategies. Again, the exact problem was 
shown to be NP-hard by Conitzer and Letchford. 

Theorem 7 There is an algorithm that takes as input a turn-based game on a tree with chance nodes and a
parameter $\epsilon$, and computes a pure strategy for the leader. That strategy, combined with some best response
of the follower, achieves a payoff that differs by at most $\epsilon$ from the payoff of the leader in a Stackelberg
equilibrium in pure strategies. The algorithm runs in time $O(\epsilon^{-2}(U H_T)^2 T)$, where
$U = \max_{\sigma,\sigma'} u_1(\sigma) - u_1(\sigma')$, $T$ is the size of the game tree and $H_T$ is its height.
Proof: The algorithm is essentially the same as the one for behavioral strategies, except that leader nodes
only have $p \in \{0,1\}$. The induction hypothesis is similar, except the quantifications are over pure strategies
instead. For a given $k$, the optimal combination of the computed tradeoffs becomes:

$$A_T[k] := \max\{A_c[i] \mid i \ge k \wedge c \in \{L, R\}\}.$$

The table $A_T$ can be computed in time $O(n)$.
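
The linear-time computation for a leader node restricted to pure strategies can be sketched with a single backward sweep that maintains a running maximum. The name `pure_leader_table` is hypothetical.

```python
import math


def pure_leader_table(A_L, A_R):
    """A_T[k] = max of A_c[i] over i >= k and c in {L, R}, via a suffix maximum."""
    n = len(A_L)
    A_T = [-math.inf] * n
    running = -math.inf
    for k in range(n - 1, -1, -1):
        running = max(running, A_L[k], A_R[k])
        A_T[k] = running
    return A_T


inf = math.inf
print(pure_leader_table([1, 1, 1, -inf], [3, -inf, -inf, -inf]))
```

Compared with the behavioral case on the same two leaves, the entry for leader utility 1 drops from 2 to 1, since the leader can no longer mix between the two children.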

The performance of the algorithm is slightly better than in the behavioral case, since the most expensive 
type of node in the behavioral case can now be handled in linear time. Thus, computing each table now takes 
at most 0(n 2 ) time, which gives the desired running time. □ 

6 Discussion 

Our paper settles several open questions on the complexity of computing a Stackelberg equilibrium
in finite sequential games. The problem is NP-hard for many subclasses of extensive-form
games, and we show that the hardness holds also for games in tree form with concurrent moves. However,
there are important subclasses that admit either an efficient polynomial algorithm, or fully polynomial-time
approximation schemes (FPTAS); we provide an FPTAS for games on trees with chance. The question left
unanswered within the scope of the paper is whether there exists a (fully) polynomial-time approximation scheme
for games in tree form with concurrent moves. Our conjecture is that the answer is negative.

Second, we formalize a Stackelberg variant of the Extensive-Form Correlated Equilibrium solution concept
(SEFCE) where the leader commits to correlated strategies. We show that the complexity of the problem
is often reduced (to polynomial) compared to NP-hardness when the leader commits to behavioral strategies.
However, this does not hold in general, which is shown by our hardness result for games on DAGs.

Our paper does not address many other variants of computing a Stackelberg equilibrium where the leader
commits to correlated strategies. First of all, we consider only two-player games with one leader and one
follower. Even though computing an Extensive-Form Correlated Equilibrium in games with multiple players
is solvable in polynomial time, a recent result showed that computing an SEFCE on trees with no chance with
3 or more players is NP-hard [4]. Second, we consider only behavioral strategies (or memoryless strategies)
in games on DAGs. Extending the concept of SEFCE to strategies that can use some fixed-size memory is a
natural continuation of the present work.
7 Appendix: Hardness Results

In this section we provide the missing proof of NP-hardness. 

7.1 Computing Exact Strategies in Concurrent-Move Games 

For the analysis in this section we use a variant of the NP-complete problem Knapsack, which we call
Knapsack with unit-items:

Knapsack with unit-items: Given $N$ items with positive integer weights $w_1, \ldots, w_N$ and
values $v_1, \ldots, v_N$, a weight budget $W$, and a target value $K$, and such that at least $W$ of
the items have weight and value 1, does there exist $J \in \mathcal{P}(N)$ such that $\sum_{i \in J} w_i \le W$ and
$\sum_{i \in J} v_i \ge K$?

The following lemma will be useful. 

Lemma 2 The Knapsack with unit-items problem is NP-complete. 

Proof: We can reduce from the ordinary Knapsack problem. Given $N$ items with weights $w_1, \ldots, w_N$
and values $v_1, \ldots, v_N$, a weight budget $W$ and a target $K$, we form $N + W$ items. The weights and values
of the first $N$ items are given by $w_i$ and $(W + 1) \cdot v_i$, for $i = 1, \ldots, N$. The next $W$ items are given weight
and value 1. The weight budget $W$ is unchanged, but the new target value is $(W + 1)K$. □
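
The reduction in the proof of Lemma 2 is easy to state as code. The sketch below builds the unit-items instance and, purely for illustration, compares the answers of both instances with an exponential brute-force check on a tiny example; the function names are hypothetical.

```python
from itertools import combinations


def to_unit_items(weights, values, W, K):
    """Scale the values by W + 1 and append W items of weight 1 and value 1."""
    new_weights = list(weights) + [1] * W
    new_values = [(W + 1) * v for v in values] + [1] * W
    return new_weights, new_values, W, (W + 1) * K


def has_solution(weights, values, W, K):
    """Brute force: is there a subset with total weight <= W and total value >= K?"""
    indices = range(len(weights))
    return any(
        sum(weights[i] for i in J) <= W and sum(values[i] for i in J) >= K
        for r in range(len(weights) + 1)
        for J in combinations(indices, r)
    )


weights, values, W = [2, 3], [3, 4], 3
for K in range(9):
    original = has_solution(weights, values, W, K)
    reduced = has_solution(*to_unit_items(weights, values, W, K))
    print(K, original, reduced)
```

The unit items can contribute at most $W$ to the total value, which is less than one scaled unit $W + 1$, so they cannot lift a set of original items over the new target unless that set already met the old one.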

We can now prove the main result of this section. 

Theorem 4 (restated). Given a concurrent-move game in tree form with no chance nodes and a number
$\alpha$, it is NP-hard to decide if the leader achieves payoff at least $\alpha$ in a Stackelberg equilibrium in behavior
strategies.

Proof: Consider an instance of Knapsack with unit-items. We define a concurrent-move extensive- 
form game in a way so that the optimal utility attainable by the leader is equal to the optimal solution value 
of the Knapsack with unit-items instance. 

The game tree consists of two levels (see Figure 2). The root node consists of $N$ actions of the leader
and $N + 1$ actions of the follower. $M$ denotes a large constant that we use to force the leader to select a
uniform strategy in the root node. More precisely, we choose $M$ as the smallest integer such that
$M > WNv_i$ and $M > Nw_i$ for $i = 1, \ldots, N$. In the second level, there is a state $I_i$ corresponding to item $i$ that
models the decision of the leader to include the item in the subset (action $\oplus$), or not (action $\ominus$).

Consider a feasible solution $J$ to the Knapsack with unit-items problem. This
translates into a strategy for the leader as follows. In the root node she plays the uniform strategy, and in
sub-game $I_i$ plays $\oplus$ with probability 1 if $i \in J$ and plays $\ominus$ with probability 1 otherwise. We can now
observe that the follower plays $L$ in sub-games $I_i$ where $i \in J$, since ties are broken in favor of the leader,
and the follower plays $R$ in sub-games $I_i$ where $i \notin J$. In the root node, action $f_0$ for the follower thus leads
to payoff $-\sum_{i \in J} w_i \ge -W$. Actions $f_k$ for $k \ge 1$ lead to payoff

$$\frac{1}{N}(NM - W - M) + \frac{N-1}{N}(-W - M) = -W.$$

Since ties are broken in favor of the leader, the follower plays action $f_0$, which means that the leader receives
payoff $\sum_{i \in J} v_i$, which is the value of the Knapsack with unit-items solution.
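
The choice of the constant and the follower's payoff under the uniform root strategy can be checked numerically. The sketch below is an illustration only: it takes the two leaf payoffs appearing in the displayed expression as given, uses exact rational arithmetic, and the function names are hypothetical.

```python
from fractions import Fraction


def smallest_M(weights, values, W):
    """Smallest integer M with M > W*N*v_i and M > N*w_i for every item i."""
    N = len(weights)
    return max(max(W * N * v for v in values), max(N * w for w in weights)) + 1


def follower_payoff_fk(N, M, W):
    """Follower's expected payoff for an action f_k (k >= 1) when the leader is
    uniform in the root: 1/N * (N*M - W - M) + (N-1)/N * (-W - M)."""
    return Fraction(1, N) * (N * M - W - M) + Fraction(N - 1, N) * (-W - M)


weights, values, W = [2, 3, 1, 1, 1], [12, 16, 1, 1, 1], 3
M = smallest_M(weights, values, W)
print(M, follower_payoff_fk(len(weights), M, W))
```

The terms containing $M$ cancel, so the payoff equals $-W$ regardless of the value chosen for $M$.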
Figure 2: Game tree for reduction in proof of Theorem 4


Consider on the other hand an optimal strategy for the leader. By the structure of the game we have the
following claim.

Claim 1 Without loss of generality the leader plays using a pure strategy in each sub-game $I_i$.

Proof: If in sub-game $I_i$ the leader commits to playing $\oplus$ with probability 1, the follower will choose to
play $L$ due to ties being broken in favor of the leader. If on the other hand the leader plays $\oplus$ with probability
strictly lower than 1, the follower will choose to play $R$, leading to utility 0 for the leader, and at most 0 for
the follower. Since the leader can only obtain positive utility if the follower plays action $f_0$ in the root node,
there is thus no benefit for the leader in decreasing the utility for the follower by committing to a strictly
mixed strategy. In other words, if the leader plays $\oplus$ with probability strictly lower than 1, the leader might
as well play $\oplus$ with probability 0. □

Thus from now on we assume that the leader plays using a pure strategy in each sub-game $I_i$. Let
$J \in \mathcal{P}(N)$ be the set of indices $i$ of the sub-games $I_i$ where the leader commits to action $\oplus$.

Claim 2 If the strategy of the leader ensures positive utility it chooses an action uniformly at random in the 
root node. 


Proof: Let $\epsilon_i \in [-\frac{1}{N}, 1 - \frac{1}{N}]$ be such that the leader commits to playing action $l_i$ with probability $\frac{1}{N} + \epsilon_i$.
Then if the follower plays action $f_0$, the leader obtains payoff

$$\sum_{i \in J} v_i + N \sum_{i \in J} \epsilon_i v_i$$

and the follower obtains payoff

$$-\sum_{i \in J} w_i - N \sum_{i \in J} \epsilon_i w_i.$$

If the follower plays action $f_k$, for $k \ge 1$, the leader obtains payoff 0 and the follower obtains payoff

$$\epsilon_k NM - W.$$

Let $k$ be such that $\epsilon_k = \max_i \epsilon_i$, and assume to the contrary that $\epsilon_k > 0$. Note that

$$\epsilon_k \ge \frac{1}{N} \sum_{i : \epsilon_i > 0} \epsilon_i = -\frac{1}{N} \sum_{i : \epsilon_i < 0} \epsilon_i.$$



We now proceed by case analysis. 


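Case 1 ($\sum_{i \in J} w_i > W$): By definition of $\epsilon_k$ and $M$ we have

$$\epsilon_k M \ge \left(-\frac{1}{N} \sum_{i \in J : \epsilon_i < 0} \epsilon_i\right) M \ge -\frac{1}{N} \sum_{i \in J : \epsilon_i < 0} \epsilon_i (N w_i) = -\sum_{i \in J : \epsilon_i < 0} \epsilon_i w_i \ge -\sum_{i \in J} \epsilon_i w_i.$$

Multiplying both sides of the inequality by $N$ and using the inequality $-W > -\sum_{i \in J} w_i$, we have

$$\epsilon_k NM - W > -\sum_{i \in J} w_i - N \sum_{i \in J} \epsilon_i w_i,$$

which means that action $f_k$ is preferred by the follower. Thus the leader receives payoff 0.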

Case 2 ($\sum_{i \in J} w_i \le W$): Since we have a Knapsack with unit-items instance, there is a knapsack solution
that obtains value $1 + \sum_{i \in J} v_i$, which corresponds to a strategy for the leader that obtains the same utility.
Since the current strategy is optimal for the leader we must have $\sum_{i \in J} v_i + N \sum_{i \in J} \epsilon_i v_i \ge 1 + \sum_{i \in J} v_i$,
which means that $1 \le N \sum_{i \in J} \epsilon_i v_i \le (N^2 \max_i v_i)\epsilon_k$, and thus $\epsilon_k \ge 1/(N^2 \max_i v_i)$. We then have by
definition of $M$ that

$$\epsilon_k NM - W \ge \frac{1}{N^2 \max_i v_i} NM - W > 0.$$

Thus the payoff for the follower is strictly positive for the action $f_k$, and this is thus preferred to $f_0$,
leading to payoff 0 to the leader. □

Since there is a strategy for the leader that obtains strictly positive payoff, we can thus assume that the
strategy for the leader chooses an action uniformly at random in the root node, and the follower chooses
action $f_0$. Since $f_0$ is preferred by the follower to any other action, this means that $-\sum_{i \in J} w_i \ge -W$, and the
leader obtains payoff $\sum_{i \in J} v_i$. Thus this corresponds exactly to a feasible solution to the Knapsack with
unit-items instance of the same value. □


8 Appendix: Approximating Optimal Strategies 

In this section we provide the missing details for the algorithms that approximate the optimal strategies for 
the leader to commit to, for both the behavioral and pure case. 

Theorem 6 (restated). There is an algorithm that takes as input a turn-based game on a tree with chance
nodes and a parameter $\epsilon$, and computes a behavioral strategy for the leader. That strategy, combined with
some best response of the follower, achieves a payoff that differs by at most $\epsilon$ from the payoff of the leader
in a Stackelberg equilibrium in behavioral strategies. The algorithm runs in time $O(\epsilon^{-3}(U H_T)^3 T)$, where
$U = \max_{\sigma,\sigma'} u_1(\sigma) - u_1(\sigma')$, $T$ is the size of the game tree and $H_T$ is its height.

We have provided the algorithm in the main body of the paper; its correctness and runtime will follow 
from the next lemma. 

Lemma 3 The algorithm of Theorem [6] is correct and has the desired runtime. 
Proof: Recall we are given a turn-based game on a tree with chance nodes and parameter $\epsilon$, and the goal
is to compute a behavioral strategy for the leader. We constructed the algorithm in Theorem 6 so that it uses
dynamic programming to store a table of values for each node in the tree, i.e., a discretized representation of
the possible tradeoffs between the utility that the leader can get and the utility that can simultaneously be
offered to the follower. The crucial part for proving correctness is arguing that the cumulative error in the
leader's utility is bounded additively by the height of the tree.

For clarity, we repeat the induction hypothesis here. For each sub-tree $T$, the table associated with it,
$A_T$, has the following guarantee at each index $k$ in the table:

a) the leader has a strategy for the game tree $T$ that offers the follower utility $A_T[k]$ while securing utility
at least $k$ to the leader.

b) no strategy of the leader can (starting from sub-tree $T$) offer the follower utility strictly more than
$A_T[k]$, while securing utility at least $k + H_T$ to the leader, where $H_T$ is the height of the tree $T$.

We now argue this holds for each type of node in the tree. Note that the base case holds trivially by 
construction, since it is associated with the leaves of the tree. 

Leader nodes 

Let $T$ be a leader node, with successors $L$ and $R$, each with tables $A_L$ and $A_R$. If the leader plays $L$
with probability $p$ and plays $R$ with the remaining probability $(1 - p)$, followed up by the strategies that
gave the guarantees for $A_L[i]$ and $A_R[j]$, then the leader would get an expected $pi + (1 - p)j$, while being
able to offer $pA_L[i] + (1 - p)A_R[j]$ to the follower. For a given $k$, the optimal combination of the computed
tradeoffs becomes:

$$A_T[k] := \max_{i,j,p} \{ pA_L[i] + (1 - p)A_R[j] \mid pi + (1 - p)j \ge k \}$$

For part 1 of the induction hypothesis, the strategy that guarantees $A_T[k]$ simply combines the strategies
for the maximizing $A_L[i]$ and $A_R[j]$ along with the probability $p$ at node $T$. For a given $i$, $j$, and $k$, finding
the optimal value of $p$ amounts to maximizing a linear function over an interval, i.e., it will attain its maximum
at one of the end points of the interval. The table $A_T$ can thus be filled in time $O(n^3)$ by looping over all
$0 \le i, j, k < n$, where $n$ is the number of entries in each table.

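For part 2 of the induction hypothesis, assume for contradiction that some strategy $\sigma$ yields utilities
$(u_1^\sigma, u_2^\sigma)$ with

$$u_1^\sigma \ge k + H_T \quad \text{and} \quad u_2^\sigma > A_T[k] \tag{9}$$

Let $p_\sigma$ be the probability that $\sigma$ assigns to the action $L$, and let $(u_1^{\sigma,L}, u_2^{\sigma,L})$ and $(u_1^{\sigma,R}, u_2^{\sigma,R})$ be the utilities
from playing $\sigma$ and the corresponding follower strategy in the left and right child respectively. By definition,

$$u_i^\sigma = p_\sigma \cdot u_i^{\sigma,L} + (1 - p_\sigma) \cdot u_i^{\sigma,R}, \quad \forall i \in \{1,2\} \tag{10}$$

By the induction hypothesis,

$$u_2^{\sigma,c} \le A_c[\lfloor u_1^{\sigma,c} \rfloor - H_T + 1], \quad \forall c \in \{L,R\} \tag{11}$$

Thus,

$$A_T[k] < u_2^\sigma \le p_\sigma \cdot A_L[\lfloor u_1^{\sigma,L} \rfloor - H_T + 1] + (1 - p_\sigma) \cdot A_R[\lfloor u_1^{\sigma,R} \rfloor - H_T + 1] \tag{12}$$

But

\begin{align}
& p_\sigma \cdot (\lfloor u_1^{\sigma,L} \rfloor - H_T + 1) + (1 - p_\sigma) \cdot (\lfloor u_1^{\sigma,R} \rfloor - H_T + 1) \tag{13} \\
& \quad > p_\sigma \cdot (u_1^{\sigma,L} - H_T) + (1 - p_\sigma) \cdot (u_1^{\sigma,R} - H_T) \tag{14} \\
& \quad = u_1^\sigma - H_T \ge k \tag{15}
\end{align}

meaning that $i = \lfloor u_1^{\sigma,L} \rfloor - H_T + 1$ and $j = \lfloor u_1^{\sigma,R} \rfloor - H_T + 1$ satisfy the constraints in the definition of
$A_T[k]$, which contradicts the assumption that $u_2^\sigma > A_T[k]$.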

Chance nodes 

Let $T$ be a chance node, with successors $L$ and $R$, each with tables $A_L$ and $A_R$, and let $p$ be the
probability that chance picks $L$. If the leader combines the strategies that gave the guarantees for $A_L[i]$ and
$A_R[j]$, then the leader would get an expected $pi + (1 - p)j$ while being able to offer $pA_L[i] + (1 - p)A_R[j]$
to the follower. For a given $k$, the optimal combination of the computed tradeoffs becomes:

$$A_T[k] := \max_{i,j} \{ pA_L[i] + (1 - p)A_R[j] \mid pi + (1 - p)j \ge k \}$$

For part 1 of the induction hypothesis, the strategy that guarantees $A_T[k]$ simply combines the strategies
for the maximizing $A_L[i]$ and $A_R[j]$. The table $A_T$ can thus be filled in time $O(n^3)$ by looping over all
$0 \le i, j, k < n$, and this can even be improved to $O(n^2)$ by a simple optimization.

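For part 2 of the induction hypothesis, assume for contradiction that some strategy $\sigma$ yields utilities
$(u_1^\sigma, u_2^\sigma)$ with

$$u_1^\sigma \ge k + H_T \quad \text{and} \quad u_2^\sigma > A_T[k] \tag{16}$$

Let $(u_1^{\sigma,L}, u_2^{\sigma,L})$ and $(u_1^{\sigma,R}, u_2^{\sigma,R})$ be the utilities from playing $\sigma$ and the corresponding follower strategy in
the left and right child respectively. By definition,

$$u_i^\sigma = p \cdot u_i^{\sigma,L} + (1 - p) \cdot u_i^{\sigma,R}, \quad \forall i \in \{1,2\} \tag{17}$$

By the induction hypothesis,

$$u_2^{\sigma,c} \le A_c[\lfloor u_1^{\sigma,c} \rfloor - H_T + 1], \quad \forall c \in \{L,R\} \tag{18}$$

Thus,

$$A_T[k] < u_2^\sigma \le p \cdot A_L[\lfloor u_1^{\sigma,L} \rfloor - H_T + 1] + (1 - p) \cdot A_R[\lfloor u_1^{\sigma,R} \rfloor - H_T + 1] \tag{19}$$

But

\begin{align}
& p \cdot (\lfloor u_1^{\sigma,L} \rfloor - H_T + 1) + (1 - p) \cdot (\lfloor u_1^{\sigma,R} \rfloor - H_T + 1) \tag{20} \\
& \quad > p \cdot (u_1^{\sigma,L} - H_T) + (1 - p) \cdot (u_1^{\sigma,R} - H_T) \tag{21} \\
& \quad = u_1^\sigma - H_T \ge k \tag{22}
\end{align}

meaning that $i = \lfloor u_1^{\sigma,L} \rfloor - H_T + 1$ and $j = \lfloor u_1^{\sigma,R} \rfloor - H_T + 1$ satisfy the constraints in the definition of
$A_T[k]$, which contradicts the assumption that $u_2^\sigma > A_T[k]$.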


Follower Nodes 
Let $T$ be a follower node, with successors $L$ and $R$, each with tables $A_L$ and $A_R$, and let $\tau_L$ and $\tau_R$ be
the min-max value for the follower in $L$ and $R$ respectively. If the leader combines the strategy for $A_L[i]$ in
$L$ with the minmax strategy for $R$, then the follower's best response is $L$ iff $A_L[i] \ge \tau_R$, and similarly it is
$R$ if $A_R[j] \ge \tau_L$. Thus, if we let

$$x\downarrow_{\tau} := \begin{cases} x & \text{if } x \ge \tau \\ -\infty & \text{otherwise} \end{cases}$$

then the optimal combination becomes

$$A_T[k] := \max\left(A_L[k]\downarrow_{\tau_R},\ A_R[k]\downarrow_{\tau_L}\right)$$

For part 1 of the induction hypothesis, the strategy that guarantees $A_T[k]$ simply combines the strategies
for the maximizing $A_L[i]$ or $A_R[j]$ in one branch, and playing minmax in the other. The table $A_T$ can thus
be filled in time $O(n)$.

For part 2 of the induction hypothesis, let $H_T$ be the height of the tree $T$. Suppose that some strategy $\sigma$
yields $(u_1^\sigma, u_2^\sigma)$ with

$$u_1^\sigma \ge k + H_T \quad \text{and} \quad u_2^\sigma > A_T[k] \tag{23}$$

Assume wlog. that the follower plays $L$. Let $(u_1^{\sigma,L}, u_2^{\sigma,L})$ be the utilities from playing $\sigma$ and the corresponding
follower strategy in the left child. Combined with the induction hypothesis, we get

$$A_T[k] < u_2^\sigma = u_2^{\sigma,L} \le A_L[\lfloor u_1^{\sigma,L} \rfloor - H_T + 1] = A_T[\lfloor u_1^\sigma \rfloor - H_T + 1] \tag{24}$$

But this is a contradiction, since $A_T[k]$ is monotonically decreasing in $k$ and

$$\lfloor u_1^\sigma \rfloor - H_T + 1 > u_1^\sigma - H_T \ge k \tag{25}$$

From the arguments above, the induction hypothesis holds for all types of nodes, and the tables can be computed
in polynomial time in the size of the tree and the number of entries in the tables.

To complete the proof of the theorem, let $D = \frac{H_T}{\epsilon}$ be the initial scaling of the leader's utility. Each table
for the nodes will now contain $n = UD = \frac{U H_T}{\epsilon}$ entries. Given the tables for the successors, each table can
be computed in time $O(n^3) = O(\epsilon^{-3}(U H_T)^3)$. Since there are $T$ tables in total, we get the desired running
time.

Let $A_T$ be the table for the whole tree, and let $i' = \max\{i \mid A_T[i] > -\infty\}$. The strategy associated with
$A_T[i']$ guarantees utility $i'$ to the leader, and the induction hypothesis directly gives us that no strategy of
the leader can guarantee more than $i' + H_T$. By dividing by the scaling factor $D$, we get that the strategy
associated with $A_T[i']$ guarantees a value that is at most $\epsilon$ lower than that of any other strategy. □

If we are only interested in commitment to pure strategies, a very similar scheme can be constructed;
this was stated in Theorem 7.

Theorem 7 (restated). There is an algorithm that takes as input a turn-based game on a tree with
chance nodes and a parameter $\epsilon$, and computes a pure strategy for the leader. That strategy, combined
with some best response of the follower, achieves a payoff that differs by at most $\epsilon$ from the payoff of the
leader in a Stackelberg equilibrium in pure strategies. The algorithm runs in time $O(\epsilon^{-2}(U H_T)^2 T)$, where
$U = \max_{\sigma,\sigma'} u_1(\sigma) - u_1(\sigma')$, $T$ is the size of the game tree and $H_T$ is its height.
In essence, the algorithm for Theorem 7 is the same, except leader nodes only consider $p \in \{0,1\}$. The
induction hypothesis is the same, except the quantifications are over pure strategies instead. We argue the 
correctness of this algorithm formally in the following lemma. 

Lemma 4 The algorithm for Theorem 7 is correct and has the desired runtime.

Proof: We have the same construction and induction hypothesis as in Theorem 6. Let $T$ be a leader node,
with successors $L$ and $R$, each with tables $A_L$ and $A_R$. If the leader plays $L$ (or $R$), followed up by the
strategies that gave the guarantees for $A_L[k]$ (or $A_R[k]$), then the expected leader utility would be $k$, while
being able to offer $A_L[k]$ (or $A_R[k]$ resp.) to the follower. For a given $k$, the optimal combination of the
computed tradeoffs becomes:

$$A_T[k] := \max\{A_c[i] \mid i \ge k \wedge c \in \{L, R\}\}$$


For part 1 of the induction hypothesis we simply use the move that maximizes the expression combined
with the strategies that guarantee $A_L[k]$ and $A_R[k]$ in the successors. The table $A_T$ can thus be filled in time
$O(n)$.
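
The $O(n)$ bound can be seen from a single backward sweep that maintains a running maximum. The following is a minimal sketch, assuming both child tables are Python lists of equal length indexed from 0, with $-\infty$ marking infeasible entries; the function name `leader_node_table_pure` and the witness list `choice` are illustrative and not part of the paper.

```python
NEG_INF = float("-inf")


def leader_node_table_pure(table_left, table_right):
    """Compute A_T[k] = max{A_c[i] | i >= k, c in {L, R}} for every k.

    Sweeping k from the largest index down keeps a running maximum, so
    each entry costs constant time and the whole table costs O(n).
    Also returns, for each k, the (child, index) pair attaining the
    maximum, i.e. the pure move the leader would commit to.
    """
    n = len(table_left)
    if len(table_right) != n:
        raise ValueError("child tables must have the same length")

    table = [NEG_INF] * n
    choice = [None] * n
    best, best_choice = NEG_INF, None
    for k in range(n - 1, -1, -1):
        for child, child_table in (("L", table_left), ("R", table_right)):
            if child_table[k] > best:
                best, best_choice = child_table[k], (child, k)
        table[k] = best
        choice[k] = best_choice
    return table, choice
```

The resulting table is non-increasing in $k$ by construction, which is the monotonicity property the contradiction arguments rely on.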

For part 2 of the induction hypothesis, assume for contradiction that some pure strategy $\pi$ yields utilities
$(u_1^{\pi}, u_2^{\pi})$ with

$$u_1^{\pi} \ge k + H_T \quad \text{and} \quad u_2^{\pi} > A_T[k] \tag{26}$$

Assume wlog. that $\pi$ plays $L$ at $T$, and let $(u_1^{\pi,L}, u_2^{\pi,L})$ be the utilities from playing $\pi$ and the corresponding
follower strategy in $L$. By definition,

$$u_i^{\pi} = u_i^{\pi,L} \quad \forall i \in \{1, 2\} \tag{27}$$

By the induction hypothesis,

$$u_2^{\pi,L} \le A_L[\lfloor u_1^{\pi,L} \rfloor - H_T + 1] \tag{28}$$

Thus,

$$A_T[k] < u_2^{\pi} \le A_L[\lfloor u_1^{\pi,L} \rfloor - H_T + 1] \tag{29}$$

But

$$\lfloor u_1^{\pi,L} \rfloor - H_T + 1 \ge u_1^{\pi,L} - H_T = u_1^{\pi} - H_T \ge k \tag{30}$$

meaning that $i = \lfloor u_1^{\pi,L} \rfloor - H_T + 1$ satisfies the constraints in the definition of $A_T[k]$, which contradicts the
assumption that $u_2^{\pi} > A_T[k]$.

The proof for the other nodes is identical to those for mixed strategies. The modified induction hypothesis
thus holds for all types of nodes, and the tables can be computed in polynomial time in the size of the tree and
the number of entries in the tables. The proof of Theorem 7 is very similar to that of Theorem 6, except the
calculation has become slightly more efficient. This completes the proof. □